AI4FoodSecurity - A Challenge for Crop Type Classification at Field Level


Get started with the challenge

This notebook will get you started with downloading, exploring and analysing the input and output data of the challenge.

The proposed challenge will focus on crop type classification based on a time-series input of Sentinel-1, Sentinel-2 and Planet Fusion data. The challenge will cover two areas of interest, in Germany and South Africa, with high-quality cadastral data on field boundaries and crop types as ground truth input.

The challenge will consist of two tracks:

You can choose to participate in both tracks or select only one of them. The evaluation mechanism, the rules and the prize catalogue are the same for both tracks.

This notebook showcases how to download and process the data, but you are free to use any open-source Python library designed for Earth Observation data, such as eo-learn. In this notebook, the data is stored as TIFF images, NumPy arrays and GeoPandas dataframes to facilitate processing, and torch is the preferred framework for data processing and training. However, you can use any other Python tool you prefer to process the provided data.

The notebook also showcases how to generate a valid submission file.

As per challenge rules, the following applies:

Code for the winning solutions will be reviewed to ensure rules have been followed.

The content of the notebook is as follows:

  0. Requirements
  1. Data Overview

    1.1. Area of Interest for Brandenburg

    1.2. Data Types for Brandenburg

    1.3. Area of Interest for South Africa

    1.4. Data Types for South Africa

  2. Data Processing and ML Training

    2.1. Exploiting Planet Fusion Data

    2.2. Exploiting Sentinel-1 Data

    2.3. Exploiting Sentinel-2 Data

  3. Prepare a Submission

    3.1. South Africa Submission Example

    3.2. Brandenburg Submission Example

0. Requirements

To download the data through the provided APIs, you need to create a Radiant MLHub account and generate an API key. After creating your API key, replace YOUR_API_KEY_HERE in the cell below with its value.
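A common pitfall is running the download cells while the placeholder is still in place. The sketch below is a hypothetical helper (not part of the challenge code) that refuses the placeholder and exports the key as an environment variable. It assumes the Radiant MLHub Python client reads the key from `MLHUB_API_KEY`; check the client documentation for your version.

```python
import os

PLACEHOLDER = "YOUR_API_KEY_HERE"


def resolve_api_key(explicit_key=None, env_var="MLHUB_API_KEY"):
    """Return a usable API key, or raise if only the placeholder is available.

    Hypothetical helper: prefers an explicitly passed key, then falls back to
    the environment variable (MLHUB_API_KEY is an assumed variable name).
    """
    key = explicit_key if explicit_key else os.environ.get(env_var, "")
    key = key.strip()
    if not key or key == PLACEHOLDER:
        raise ValueError(
            f"No API key set: replace {PLACEHOLDER} or export {env_var}."
        )
    # Export the key so that clients reading the environment can pick it up.
    os.environ[env_var] = key
    return key
```

For example, `resolve_api_key("YOUR_API_KEY_HERE")` raises a `ValueError`, while a real key is stripped of surrounding whitespace and stored in the environment.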

If you cannot use the APIs

Check the challenge description for alternative options to retrieve the data as compressed files. You can also download directly:

You are not required to download the entire dataset to run this notebook; a subset of the data is sufficient.

1. Data Overview

This section gives an overview of all the necessary data to train your model and how to download them.

In general, the data sources to download are the following:

NOTE: It is not mandatory to exploit all of the data sources. The challenge participants are free to choose a subset or a combination of data sources to exploit for model training purposes.

The AOIs chosen for this challenge are in the German state of Brandenburg and in the Republic of South Africa, as detailed below.

Brandenburg-Germany Data

The Brandenburg-Germany data contains the following time series from Sentinel-1, Sentinel-2 and Planet Fusion in UTM zone 33N (i.e. EPSG:32633):

South Africa Data

The South Africa data contains the following time series from Sentinel-1, Sentinel-2 and Planet Fusion in UTM zone 34S (i.e. EPSG:32734):
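The EPSG codes above follow directly from the UTM zone grid: WGS 84 / UTM codes are 32600 plus the zone number in the northern hemisphere and 32700 plus the zone number in the southern hemisphere, with 6-degree-wide zones counted eastward from 180°W. The sketch below reproduces this rule; it ignores the special zones around Norway and Svalbard, which do not affect either AOI.

```python
import math


def utm_epsg(lon, lat):
    """Return the WGS 84 / UTM EPSG code for a longitude/latitude in degrees.

    Uses the regular 6-degree zone grid; the Norway/Svalbard exceptions are
    not handled.
    """
    if not (-180.0 <= lon <= 180.0 and -80.0 <= lat <= 84.0):
        raise ValueError("coordinates outside the UTM domain")
    # Zone 1 starts at 180 degrees West; the modulo maps lon = 180 back to zone 1.
    zone = int(math.floor((lon + 180.0) / 6.0)) % 60 + 1
    # 326xx for the northern hemisphere, 327xx for the southern hemisphere.
    return (32600 if lat >= 0 else 32700) + zone
```

For a point near Potsdam in Brandenburg (13.0°E, 52.4°N) this gives 32633, and for a point in the Western Cape (19.0°E, 33.5°S) it gives 32734, matching the CRSs listed above.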

1.1 Area of Interest for Brandenburg

In order to check the AOI for Brandenburg, you can start with the ground-truth files:

After downloading, place the files in the data folder of this notebook:

To explore the ground truth of 2018 (used for training), run the following cell. It summarizes the GeoJSON data for the AOI, which has 2534 entries with 5 attribute columns (the field ID, the area of the field in square meters, the length of the field in meters, the crop ID and the crop name) plus the geometry of the field as a polygon:

To explore the ground truth of 2019 (used for the challenge evaluation), run the following cell. It summarizes the GeoJSON data for the AOI, which has 2064 entries and the same columns as the example above. However, the crop_id column is always 0 and crop_name is always No Data, because these labels are reserved for the challenge evaluation. Participants can only use the geometry column, i.e. the polygon bounding each field, to make their predictions:
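If you want a quick check without GeoPandas (where `geopandas.read_file(...).info()` gives a similar overview), the sketch below summarizes a GeoJSON FeatureCollection using only the standard library. It counts the entries, lists the attribute names, and reports whether every field carries the evaluation placeholders described above (crop_id 0, crop_name No Data). The helper name is hypothetical.

```python
import json
from collections import Counter


def summarize_ground_truth(geojson_text):
    """Summarize a ground-truth GeoJSON FeatureCollection.

    Returns the number of fields, the sorted attribute names, and whether
    every feature has the placeholder labels (crop_id 0, crop_name 'No Data').
    """
    collection = json.loads(geojson_text)
    features = collection.get("features", [])
    columns = Counter()
    placeholder_only = bool(features)
    for feature in features:
        props = feature.get("properties") or {}
        columns.update(props.keys())
        if props.get("crop_id") != 0 or props.get("crop_name") != "No Data":
            placeholder_only = False
    return {
        "entries": len(features),
        "columns": sorted(columns),
        "placeholder_labels_only": placeholder_only,
    }
```

Run on a training file, `placeholder_labels_only` should be `False`; on an evaluation file it should be `True`.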

When you look at the crop IDs and crop names in the training ground truth, you should see 9 crop types with the following IDs:

These crop types are not evenly distributed across the agricultural fields, as the number of fields per crop shows:

However, the distribution looks different in hectares, because some crops tend to be grown on larger fields:
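Both views can be computed from the same two attributes, the crop name and the field area in square meters. The sketch below is a plain-Python version of that aggregation (with GeoPandas you would group by `crop_name` instead). It assumes the input is an iterable of `(crop_name, area_m2)` pairs taken from a ground-truth file; the crop names in the example are hypothetical.

```python
from collections import defaultdict


def crop_statistics(fields):
    """Return per-crop field counts and total area in hectares.

    `fields` is an iterable of (crop_name, area_m2) pairs.
    """
    counts = defaultdict(int)
    hectares = defaultdict(float)
    for crop_name, area_m2 in fields:
        counts[crop_name] += 1
        hectares[crop_name] += area_m2 / 10_000.0  # 1 ha = 10,000 m^2
    return dict(counts), dict(hectares)
```

With `[("Wheat", 50_000), ("Wheat", 30_000), ("Maize", 200_000)]`, Wheat has more fields (2 vs. 1) but Maize covers more land (20 ha vs. 8 ha), which is exactly the kind of difference between the two plots above.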

Moreover, we can examine how fragmented the fields of each crop type are by counting the number of fields in different hectare bins:
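A minimal sketch of the binning, again assuming `(crop_name, area_m2)` pairs as input. The bin edges below are illustrative only and are not the ones used for the plot in this notebook; each field falls into the half-open bin `[lower, upper)` containing its area in hectares.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def fields_per_area_bin(fields, edges_ha=(1, 5, 10, 20)):
    """Count fields per crop in hectare bins.

    Bins are [0, e0), [e0, e1), ..., [e_last, inf); the default edges are
    illustrative, not the challenge notebook's.
    """
    labels = [f"<{edges_ha[0]} ha"]
    labels += [f"{lo}-{hi} ha" for lo, hi in zip(edges_ha, edges_ha[1:])]
    labels.append(f">={edges_ha[-1]} ha")
    table = defaultdict(Counter)
    for crop_name, area_m2 in fields:
        # bisect_right gives the number of edges <= area, i.e. the bin index.
        index = bisect_right(edges_ha, area_m2 / 10_000.0)
        table[crop_name][labels[index]] += 1
    return {crop: dict(bins) for crop, bins in table.items()}
```

A crop whose fields pile up in the smallest bins is highly fragmented, which also means fewer pixels per field for a field-level classifier.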

For Brandenburg, the 2018 ground truth comes from tile 18E-242N, as shown below:

For Brandenburg, the 2019 ground truth also comes from tile 18E-242N, as shown below. As you can see, only the polygons are given, without crop names and crop IDs, because these are reserved for the challenge evaluation.

1.2 Data Types for Brandenburg

The input images are from Sentinel-1, Sentinel-2 and Planet Fusion, with the following details:

1.2.1 Explore Planet Fusion Data over Brandenburg

1.2.2 Explore Sentinel-1 Data over Brandenburg

In your training and testing, you can utilize both ascending and descending orbit data. However, in this notebook, we will only demonstrate ascending orbit data. Changing the input_dir of S1Reader in the following cell is sufficient to explore descending orbit data.

1.2.3 Explore Sentinel-2 Data over Brandenburg

Sentinel-2 (S2) data can be initialized and read in the same way as the Sentinel-1 data above. The only differences are that you use S2Reader instead of S1Reader and point it to the Sentinel-2 data, as shown below. If the initialization of S2Reader fails, your working environment may have insufficient memory: the Sentinel-2 archive data/sentinel-2/s2-utm-33N-18E-242N-2018.zip alone is about 12GB.
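Before initializing S2Reader, it can help to check how large an archive becomes once extracted. The sketch below (a hypothetical helper) reads only the zip directory, so it is cheap even for a 12GB file. Treat the uncompressed total as a rough indication rather than an exact memory requirement: rasters that are compressed internally can take more memory once decoded into arrays.

```python
import os
import zipfile


def zip_footprint_gb(path):
    """Return (compressed, uncompressed) sizes of a zip archive in GB.

    Only the archive's central directory is read; no data is extracted.
    """
    compressed = os.path.getsize(path)
    with zipfile.ZipFile(path) as archive:
        uncompressed = sum(info.file_size for info in archive.infolist())
    return compressed / 1e9, uncompressed / 1e9
```

For example, `zip_footprint_gb("data/sentinel-2/s2-utm-33N-18E-242N-2018.zip")` lets you compare the extracted size with the memory available in your environment.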

1.3 Area of Interest for South Africa

In order to check the AOI for South Africa, you can start with the ground-truth files:

After downloading, place the files in the data folder of this notebook:

To explore the ground truth at tiles 19E-258N and 19E-259N (used for training), run the following cell. It summarizes the GeoJSON data for the AOI, which has 1715 entries in the first file and 2436 entries in the second, each with 5 attribute columns (the field ID, the area of the field in square meters, the length of the field in meters, the crop ID and the crop name) plus the geometry of the field as a polygon:

To explore the ground truth at tile 20E-259N (used for the challenge evaluation), run the following cell. It summarizes the GeoJSON data for the AOI, which has 2417 entries and the same columns as the example above. However, the crop_id column is always 0 and crop_name is always No Data, because these labels are reserved for the challenge evaluation. Participants can only use the geometry column, i.e. the polygon bounding each field, to make their predictions:

When you look at the crop IDs and crop names in the training ground truth, you should see 5 crop types with the following IDs:

These crop types are not evenly distributed across the agricultural fields, as the number of fields per crop shows:

However, the distribution looks different in hectares, because some crops tend to be grown on larger fields:

For South Africa, the training ground truth covers tiles 19E-258N and 19E-259N, as shown below:

For South Africa, the evaluation ground truth covers tile 20E-259N, as shown below. As you can see, only the polygons are given, without crop names and crop IDs, because these are reserved for the challenge evaluation.

1.4 Data Types for South Africa

The input images are from Sentinel-1, Sentinel-2 and Planet Fusion, with the following details:

1.4.1 Explore Planet Fusion Data over South Africa

1.4.2 Explore Sentinel-1 Data over South Africa

1.4.3 Explore Sentinel-2 Data over South Africa

Please note that some of the true-color views of agricultural fields above are heavily occluded by clouds. You are free to use the cloud probability mask (CLP) provided with Sentinel-2; see notebook/utils/sentinel_2_reader.py in this project for how to access it.
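One simple way to use CLP is to blank out cloudy pixels before aggregating a time series. The sketch below assumes band data shaped `(T, C, H, W)` and a CLP array shaped `(T, H, W)` with probabilities scaled to 0-255, as in Sentinel Hub's s2cloudless-based CLP band; check sentinel_2_reader.py for the actual scale and array layout of this dataset. The 0.5 threshold is an arbitrary default.

```python
import numpy as np


def mask_clouds(bands, clp, threshold=0.5, clp_max=255.0):
    """Replace cloudy pixels with NaN using a cloud-probability layer.

    bands: array (T, C, H, W); clp: array (T, H, W) scaled to [0, clp_max]
    (assumed scale). Returns the masked bands as float64 and the clear-sky
    fraction of each time step.
    """
    cloudy = (clp.astype(np.float64) / clp_max) > threshold  # (T, H, W)
    # Broadcast the per-pixel mask over the channel axis.
    masked = np.where(cloudy[:, None, :, :], np.nan, bands.astype(np.float64))
    clear_fraction = 1.0 - cloudy.mean(axis=(1, 2))
    return masked, clear_fraction
```

Time steps with a low clear fraction can then be dropped, and NaN-aware reductions such as `np.nanmean` can aggregate the remaining pixels per field.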

2. Data Processing and ML Training

This section offers some tips and pointers on how to transform the data into an ML-ready format and build a sample ML model. The implementations here are mainly based on the approach in the following paper:

Kondmann, Lukas, et al. (2021), DENETHOR: The DynamicEarthNET dataset for Harmonized, inter-Operable, analysis-Ready, daily crop monitoring

Please note that the examples below on data reading, data augmentation and ML training use each data source on its own (i.e. only Planet Fusion or only Sentinel-2), but you are free to implement any processing pipeline that fuses or ensembles the sources at the data, feature or decision level.
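As an illustration of data-level fusion, the sources first have to share a time axis, since Sentinel-1, Sentinel-2 and Planet Fusion are observed on different dates. The sketch below assumes each source has already been reduced to a per-field series of shape `(T, C)` with increasing observation days; it linearly interpolates each channel onto common target days and concatenates the channels. Linear interpolation is only one possible choice, and NaN values (e.g. from cloud masking) are not handled here.

```python
import numpy as np


def resample_to_days(days, series, target_days):
    """Linearly interpolate a (T, C) time series onto target days.

    `days` must be increasing; values outside the observed range are held at
    the first/last observation (np.interp's default behaviour).
    """
    series = np.asarray(series, dtype=np.float64)
    return np.stack(
        [np.interp(target_days, days, series[:, c]) for c in range(series.shape[1])],
        axis=1,
    )


def fuse_on_common_grid(sources, target_days):
    """Resample each (days, series) source and stack their channels."""
    return np.concatenate(
        [resample_to_days(d, s, target_days) for d, s in sources], axis=1
    )
```

The fused array has shape `(len(target_days), C_1 + C_2 + ...)` and can be fed to the same field-level classifiers as a single-source series.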

If you want to use eo-learn for these tasks, check out the documentation about existing tasks like filtering and pixel sampling here. You can easily implement your own task (as done above) by following this example. Here you can find a collection of examples including land cover and crop-type classification.
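Pixel sampling is one way to turn a field polygon into a fixed-size model input, whatever framework you use. The sketch below is a plain NumPy version, not eo-learn's task: it assumes a rasterized field mask of shape `(H, W)` and an image time series of shape `(T, C, H, W)`, and draws a fixed number of pixels from inside the field, sampling with replacement when the field is smaller than the requested number.

```python
import numpy as np


def sample_field_pixels(cube, field_mask, n_pixels=32, rng=None):
    """Randomly sample pixel time series inside one field.

    cube: array (T, C, H, W); field_mask: bool array (H, W).
    Returns an array (T, C, n_pixels).
    """
    rng = np.random.default_rng() if rng is None else rng
    rows, cols = np.nonzero(field_mask)
    if rows.size == 0:
        raise ValueError("field mask contains no pixels")
    # Small fields are oversampled so every field yields the same input size.
    replace = rows.size < n_pixels
    idx = rng.choice(rows.size, size=n_pixels, replace=replace)
    return cube[:, :, rows[idx], cols[idx]]
```

Drawing a fresh sample at every epoch also acts as a simple form of data augmentation.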

2.1 Exploiting Planet Fusion Data

The following example initializes and trains a field-based crop-type classification model using Planet Fusion data over the Brandenburg area. By changing the data reader (e.g. from PlanetReader to S1Reader or S2Reader) and the training data directories, you can run the same training for other AOIs and data sources. The example data readers are provided under notebook/utils/ of this project:
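Because the crop types are unevenly distributed (see Section 1), you may want to weight the training loss by class frequency. This is not part of the example training above; it is a sketch assuming labels already mapped to contiguous class indices `0..num_classes-1`. The resulting list could, for instance, be converted to a float tensor and passed as the `weight` argument of `torch.nn.CrossEntropyLoss`.

```python
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Per-class loss weights proportional to 1 / class frequency.

    Weights are normalised to average 1 over the classes that occur;
    classes absent from `labels` get weight 0.
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels given")
    raw = [1.0 / counts[c] if counts[c] else 0.0 for c in range(num_classes)]
    present = sum(1 for w in raw if w > 0)
    scale = present / sum(raw)
    return [w * scale for w in raw]
```

For labels `[0, 0, 0, 1]` and 3 classes, the weights are approximately `[0.5, 1.5, 0.0]`: the rare class counts three times as much as the frequent one.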

The training steps can also be monitored with TensorBoard, as shown below:

2.2 Exploiting Sentinel-1 Data

The following example initializes and trains a field-based crop-type classification model using Sentinel-1 data over the Brandenburg area. By changing the data reader (e.g. from S1Reader to PlanetReader or S2Reader) and the training data directories, you can run the same training for other AOIs and data sources. The example data readers are provided under notebook/utils/ of this project:

2.3 Exploiting Sentinel-2 Data

The following example initializes and trains a field-based crop-type classification model using Sentinel-2 data over the South Africa area. By changing the data reader (e.g. from S2Reader to PlanetReader or S1Reader) and the training data directories, you can run the same training for other AOIs and data sources. The example data readers are provided under notebook/utils/ of this project:

3. Prepare Submission

A valid submission is a JSON file with four columns, fid, crop_id, crop_name and crop_prop, holding the field ID, the predicted crop ID, the predicted crop name and the softmax probabilities as an array whose length equals the number of classes. For instance:

is one prediction row for a field in South Africa, where the 5 classes are:

The following code blocks show how to generate a valid submission using a crop type prediction model:
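Independently of the model, building the rows themselves is straightforward. The sketch below turns raw model outputs (logits) into rows with the four required columns; the class IDs and names it receives must be aligned with the model's output indices. Writing the rows as a JSON list of records is an assumption here, so check the exact layout expected by the evaluation before submitting.

```python
import json
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_submission(field_ids, logits_per_field, crop_ids, crop_names):
    """Build one row per field with fid, crop_id, crop_name and crop_prop.

    crop_ids and crop_names are aligned with the model's output indices.
    """
    rows = []
    for fid, logits in zip(field_ids, logits_per_field):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        rows.append({
            "fid": fid,
            "crop_id": crop_ids[best],
            "crop_name": crop_names[best],
            "crop_prop": probs,
        })
    return rows
```

The rows can then be written with `json.dump(rows, f)`. Note that NumPy integers are not JSON-serializable, so convert field IDs with `int(...)` if they come from an array.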

3.1 South Africa Submission Example

Set the test data directory and the test field polygons to initialize the data reader. In this example, we generate a submission file for South Africa. If the submission cell below fails because the model or the directories are missing, go back to Exploiting Sentinel-2 Data and run the cells up to model initialization.

3.2 Brandenburg Submission Example

Set the test data directory and the test field polygons to initialize the data reader. In this example, we generate a submission file for Brandenburg. If the submission cell below fails because the model or the directories are missing, go back to Exploiting Sentinel-1 Data and run the cells up to model initialization.

NOTE: you can now send the submission for evaluation.

The challenge is organized by Planet, Radiant Earth, TUM, DLR and Helmholtz AI, and hosted by the European Space Agency.